This work analyzes the effect of mobility restrictions on the spread of COVID-19 in two cities, Montreal and Toronto. The Montreal data come from INSPQ and were obtained through an unofficial API that powers INSPQ's own dashboard (see: https://github.com/SimonCoulombe/covidtwitterbot); the data are not otherwise publicly available. A limited version of the Toronto data is publicly available from the Ontario government, and we used it to extract daily frequencies for Toronto.
This study aims to capture the dynamic relationship between mobility restrictions and the spread. We use positivity rates, which reflect the spread, and all_day_bing_tiles_visited_relative_change from Facebook's Movement Range data, which measures the positive or negative change in movement relative to a baseline, for Montreal and Toronto.
We developed two metrics to measure the effect of social mobility on the spread: the rolling correlation, which reflects the overall effectiveness of mobility restrictions, and the average delay in the effect of those restrictions, which reflects how efficient contact tracing is when community spread rises.
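Before turning to the data, these two metrics can be illustrated on synthetic series. The sketch below is not the pipeline used later, only a toy construction with made-up numbers: a hypothetical mobility series, and a PR series that echoes it with a known 3-day delay.

```r
# Toy illustration of the two metrics (base R only).
# PR echoes mobility with a known 3-day delay plus noise, so the
# per-window best lag should average close to 3.
set.seed(1)
n   <- 120
mob <- rnorm(n)                                             # hypothetical mobility changes
pr  <- c(rep(NA, 3), mob[1:(n - 3)]) + rnorm(n, sd = 0.2)   # PR lags mobility by 3 days

w    <- 14   # rolling-window width (days)
lags <- 1:7  # candidate delays

best_lag <- sapply(1:(n - w - max(lags)), function(j) {
  idx <- j:(j + w - 1)
  co  <- sapply(lags, function(L) cor(mob[idx], pr[idx + L], use = "complete.obs"))
  lags[which.max(co)]            # delay maximizing the window correlation
})
roll_corr <- sapply(seq_along(best_lag), function(j) {
  idx <- j:(j + w - 1)
  cor(mob[idx], pr[idx + best_lag[j]], use = "complete.obs")
})

mean(roll_corr)   # "overall effectiveness" metric: high, near 1
mean(best_lag)    # "average delay" metric: close to the true 3 days
```

With white-noise mobility the per-window best lag almost always recovers the true delay; the real series below are autocorrelated and noisy, which is exactly why a grid search over lags and window widths is needed.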
setwd("~/Dropbox/RNS2/Quebec")
# library(RCurl)
# URL <- "https://www.inspq.qc.ca/sites/default/files/covid/donnees/covid19-hist.csv"
# x <- getURL(URL)
# out <- read.csv(textConnection(x))
# saveRDS(out, file = "quebec0204.rds")
out <- readRDS(file = "quebec0204.rds")
head(out[1:6])
## ï..Date Regroupement Croisement Nom
## 1 Date inconnue Région RSS01 Bas-Saint-Laurent
## 2 Date inconnue Région RSS02 Saguenay–Lac-Saint-Jean
## 3 Date inconnue Région RSS03 Capitale-Nationale
## 4 Date inconnue Région RSS04 Mauricie et Centre-du-Québec
## 5 Date inconnue Région RSS05 Estrie
## 6 Date inconnue Région RSS06 Montréal
## cas_cum_lab_n cas_cum_epi_n
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
Cleaning the data and keeping only Montreal:
# Removing "bad" dates and converting dates
data <- out[-c(1:43), ]
data$date <- as.Date(data$ï..Date, "%Y-%m-%d")
data <- data[, c(49, 2:48)]
row.names(data) <- 1:nrow(data)
#Changing Montreal's name
ind <- grep("Montr", data$Nom)
# Getting only Montreal
dfmontr <- data[ind, ]
dfmontr_v1 <- data[data$Croisement=="RSS06", ]
Here is the partial dictionary from https://github.com/SimonCoulombe/covidtwitterbot about the features in the data:
cas_cum_tot_n = number of confirmed cases (cumulative) according to declaration date
cas_cum_lab_n = number of laboratory-confirmed cases (cumulative) according to declaration date
cas_cum_epi_n = number of confirmed cases by epidemiological link (cumulative) according to declaration date
cas_quo_tot_n = number of confirmed cases (daily) according to declaration date
cas_quo_lab_n = number of laboratory-confirmed cases (daily) according to declaration date
cas_quo_epi_n = number of cases confirmed by epidemiological link (daily) according to declaration date
act_cum_tot_n = number of active cases (today)
ret_cum_tot_n = number of recovered cases (cumulative)
ret_quo_tot_n = number of recovered cases (daily)
dec_cum_tot_n = total deaths (cumulative)
dec_cum_chs_n = total deaths in CHSLDs (cumulative)
dec_cum_rpa_n = total deaths in RPA (cumulative)
dec_cum_dom_n = total deaths at home and unknown (cumulative)
dec_cum_aut_n = total deaths in RI and other (cumulative)
dec_quo_tot_n = total deaths (daily)
dec_quo_chs_n = total deaths in CHSLDs (daily)
dec_quo_rpa_n = total deaths in RPA (daily)
dec_quo_dom_n = total deaths at home and unknown (daily)
dec_quo_aut_n = total deaths in IR and other (daily)
hos_quo_tot_n = new hospitalizations (regular + intensive care)
hos_quo_reg_n = new non-intensive care hospitalizations
hos_quo_si_n = new intensive care hospitalizations
psi_cum_test_pos = cumulative confirmed cases
psi_cum_test_inf = number of people ruled out (negative test), cumulative
psi_cum_test_n = number of people tested, cumulative
psi_quo_test_pos = daily confirmed cases ## ATTENTION: this column is empty for the last day
psi_quo_test_inf = daily negative tests ## ATTENTION: this column is empty for the last day
psi_quo_test_n = number of people tested, daily ## ATTENTION: this column is empty for the last day
df <- dfmontr_v1[dfmontr_v1$date > "2020-02-29", ]
df <- df[-nrow(df), ] # the last day (2/3) is zero
# Cases
df$psi_quo_pos_n <- as.numeric(df$psi_quo_pos_n)
cases <- df$psi_quo_pos_n # + tests
casedf <- data.frame(date = df$date, days = 1:length(cases), cases = cases)
# PR
df$psi_quo_pos_t <- as.numeric(df$psi_quo_pos_t)
pr <- df$psi_quo_pos_t #PR
prdf <- data.frame(date = df$date, days = 1:length(pr), pr = pr)
# Tests
df$psi_quo_tes_n <- as.numeric(df$psi_quo_tes_n)
test <- df$psi_quo_tes_n
tesdf <- data.frame(date = df$date, days = 1:length(pr), test = test)
# Smoothing with loess()
loepr <- loess(prdf$pr~prdf$days, degree=2, span = 0.01)
fitpr <- predict(loepr, prdf$days)
loecs <- loess(casedf$cases~casedf$days, degree=2, span = 0.01)
fitcs <- predict(loecs, casedf$days)
loetes <- loess(tesdf$test~tesdf$days, degree=2, span = 0.01)
fittes <- predict(loetes, tesdf$days)
#Plots
plot(cases, type = "h", col = "pink",
main = "Number of Positive Tests")
lines(casedf$days, fitcs/1.5, col = "red", lwd = 2)
plot(pr, type = "h", col = "lightblue",
main = "Positivity Rate - PR")
lines(prdf$days, fitpr/1.5, col = "darkblue", lwd = 2)
plot(test, type = "h", col = "lightgreen",
main = "Number of Tests")
lines(tesdf$days, fittes/1.5, col = "darkgreen", lwd = 2)
plot(cases, type = "h", col = "pink")
lines(casedf$days, fitcs/1.5, col = "red", lwd = 2)
lines(prdf$days, fitpr*40, col = "blue", lwd = 2)
It is clear from these plots that the early numbers, up to roughly day 100, are not usable due to very low test numbers. Hence we look at the epidemic curve after day 100.
library(haven)
mobdf <- read_dta("movement-range-data-2021-02-06/montrealmob.dta")
mob1 <- mobdf[1:nrow(df),1:2]
workingdf <- data.frame(Date = df$date, day = 1:nrow(df), dpr = prdf$pr, mob1$all_day_bing_tiles_visited_relat)
names(workingdf)[4] <- "mob1"
head(workingdf)
## Date day dpr mob1
## 1 2020-03-01 1 0.00 0.02466
## 2 2020-03-02 2 11.11 -0.05464
## 3 2020-03-03 3 0.00 -0.04480
## 4 2020-03-04 4 0.00 -0.04536
## 5 2020-03-05 5 0.00 -0.01491
## 6 2020-03-06 6 0.00 -0.04394
Here is a naive way of looking at the relationship by cross correlations:
library(astsa)
bbb <- ccf(workingdf$mob1, workingdf$dpr, na.action = na.pass)
x <- workingdf$mob1
y <- workingdf$dpr
lag2.plot(x, y, 11)
The cross-correlations use the whole series, with lags up to 11. As the plots make clear, the correlation between mobility and PR is negative at all lags. In other words, as mobility goes down, the spread rises.
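As a sanity check of the lag convention in ccf(), consider a synthetic pair where y copies x with a 5-day delay (the series and numbers here are illustrative only). Since ccf(x, y) estimates the correlation between x[t+k] and y[t], the peak should appear at k = -5 when x leads y:

```r
# ccf(x, y) estimates cor(x[t+k], y[t]); if y copies x with a 5-day
# delay, the cross-correlation should peak at k = -5 (x leads y).
set.seed(42)
x <- rnorm(200)
y <- c(rep(0, 5), x[1:195]) + rnorm(200, sd = 0.1)

cc   <- ccf(x, y, lag.max = 11, plot = FALSE)
peak <- cc$lag[which.max(abs(cc$acf))]
peak   # -5
```

The same convention applies to the mobility/PR pair above: a negative-lag peak means mobility leads PR.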
To capture the dynamic nature of this relationship, we calculate correlations on rolling windows of days. We run a grid search over the lag and the window width to maximize the mean of the rolling correlations.
library(dplyr)
library(zoo)
lag = 1:21
w = 7:48
grid <- as.matrix(expand.grid(lag, w))
rol <- c()
for(i in 1:nrow(grid)) {
montl <- workingdf[2:nrow(workingdf), ] %>%
mutate(dprL = lead(dpr, grid[i,1]))
montll <- montl[complete.cases(montl), 4:5]
tmp <- rollapply(montll, width=grid[i,2], function(x) cor(x[,1],x[,2]),
by.column=FALSE)
sub <- tmp[100:length(tmp)]
#rol[i] <- mean(sub[sub > 0])
rol[i] <- mean(sub)
}
opt <- grid[which(rol == max(rol)), ]
montl <- workingdf[2:nrow(workingdf), ] %>%
mutate(dprL = lead(dpr, opt[1]))
montll <- montl[complete.cases(montl), 4:5]
roll <- rollapply(montll, width=opt[2], function(x) cor(x[,1],x[,2]),
by.column=FALSE)
loerc <- loess(roll~c(1:length(roll)), degree=2, span = 0.05)
fitrc <- predict(loerc, 1:length(roll))
plot(roll[100:length(roll)], type = "h", lwd = 2, col = "orange",
main = paste0("Rolling correlation with ", opt[1],
"-day delay and ", opt[2]/7,"-week window"),
xlab = "Days", ylab = "Correlation")
lines(fitrc[100:length(roll)], col = "darkgreen", lwd = 2)
lines(fitpr[100:length(roll)]/13, col = "darkblue", lwd = 2)
We removed the first 100 days, March 1 to June 8, the period when only symptomatic people were tested because few tests were available. The dark-blue line shows the smoothed (and scaled) PR numbers, and the green line the smoothed rolling correlations. The graph has two regimes: the section between day 111 (September 27) and day 192 (December 17), and the periods before and after it. Between September 27 and December 17, the correlation is always positive, without cyclical swings between positive and negative. Before September 27, the spread is subdued, with very low PR numbers. During the second wave, with rising daily PR numbers, the correlation between mobility and PR becomes positive, stable, and substantial. Although conventional community spread takes about 14 days to produce symptoms, our grid search captures the most positive relationship between mobility and PR at a lag of 2 days. This could be the result of efficient contact tracing: when somebody tests positive, all of their contacts are called in for testing, which shortens the standard 14-day window of community spread.
But why does the correlation turn negative after December 17? That is the point where the daily positivity rates spike. One reason could be a change in the delay between mobility and PR: community spread may have grown beyond its earlier level, so that contact tracing was no longer as efficient. We run another grid search targeting the section after December 17 to see whether a longer delay makes that part positive:
library(dplyr)
library(zoo)
lag = 1:21
w = 7
grid <- as.matrix(expand.grid(lag, w))
rol <- c()
for(i in 1:nrow(grid)) {
montl <- workingdf[2:nrow(workingdf), ] %>%
mutate(dprL = lead(dpr, grid[i,1]))
montll <- montl[complete.cases(montl), 4:5]
tmp <- rollapply(montll, width=grid[i,2], function(x) cor(x[,1],x[,2]),
by.column=FALSE)
sub <- tmp[292:length(tmp)]
rol[i] <- mean(sub)
}
opt <- grid[which(rol == max(rol)), ]
montl <- workingdf[2:nrow(workingdf), ] %>%
mutate(dprL = lead(dpr, opt[1]))
montll <- montl[complete.cases(montl), 4:5]
roll <- rollapply(montll, width=opt[2], function(x) cor(x[,1],x[,2]),
by.column=FALSE)
loerc <- loess(roll~c(1:length(roll)), degree=2, span = 0.05)
fitrc <- predict(loerc, 1:length(roll))
plot(roll[292:length(roll)], type = "h", lwd = 2, col = "red",
main = paste0("Rolling correlation with ", opt[1],
"-day delay and ", opt[2]/7,"-week window"),
xlab = "Days", ylab = "Correlation",
ylim = c(-0.2, 1.1))
lines(fitrc[292:length(roll)], col = "green", lwd = 2)
lines(fitpr[292:length(roll)]/12, col = "darkblue", lwd = 2)
As this graph, which magnifies the period after December 17, makes clear, the maximum positive correlation is obtained at a lag of 11 days. This confirms that a common lag should not be applied to the whole epidemiological curve. Instead, the delay between mobility and PR has to be tuned in each rolling window separately, to find which day(s) in the past have the maximum positive impact. This is what we do next.
library(dplyr)
library(zoo)
lag = 1:21
w = 7
mco <- c()
lagg <- c()
mdf <- workingdf[2:nrow(workingdf), ]
n <- nrow(mdf)-w
rol <- c()
for(j in 1:n){
co <-c()
for(i in 1:length(lag)) {
mdf1 <- mdf %>% mutate(dprL = lead(dpr, lag[i]))
mdf2 <- mdf1[complete.cases(mdf1), 4:5]
k <- j-1
mdf3 <- mdf2[j:(w+k), ]
co[i] <- cor(mdf3[,1], mdf3[,2])
}
mco[j] <- max(co, na.rm = TRUE)
lagg[j] <- lag[which.max(co)]
}
roll <- mco[-length(mco)]
laga <- lagg[-length(lagg)]
loerc <- loess(roll~c(1:length(roll)), degree=2, span = 0.05)
fitrc <- predict(loerc, 1:length(roll))
plot(roll[100:length(roll)], type = "h", lwd = 2, col = "pink",
main = "Rolling correlation with tuned delay and 7-day window",
xlab = "Days", ylab = "Correlation",
ylim = c(0, 1.1))
lines(fitrc[100:length(roll)], col = "red", lwd = 2)
lines(fitpr[100:length(roll)]/12, col = "darkblue", lwd = 2)
abline(a = mean(roll[100:length(roll)]), b = 0)
text(50, 1, paste0("mean corr = ", round(mean(roll[100:length(roll)]), 2)))
As explained before, we ignore the first 100 days due to misleading PR values. The graph above shows the nonparametric estimate of the rolling correlations (red) between mobility and PR. The average is very high, around 77%. We will later compare this correlation with Toronto's to see how effective similar policies were in these two big cities.
Now we look at how many days of delay between mobility and PR the algorithm identifies in each 7-day rolling window.
loel <- loess(laga~c(1:length(laga)), degree=2, span = 0.05)
fitl <- predict(loel, 1:length(laga))
plot(laga[100:length(laga)], type = "h", lwd = 2, col = "lightblue",
main = "Average delay in the effect of mobility restrictions on the spread",
xlab = "Days", ylab = "Mobility Delays - Day",
ylim = c(1, 21))
lines(fitl[100:length(laga)], col = "red", lwd = 2)
lines(fitpr[100:length(roll)]*1.6, col = "darkblue", lwd = 2)
abline(a = mean(laga[100:length(laga)]), b = 0)
text(30, 20 , paste0("mean = ", round(mean(laga[100:length(laga)]),2),
" days"))
We can think of the more immediate effects of mobility on PR as occurring when contact tracing is highly effective. When community spread increases without well-identified sources, the effect of mobility on PR stretches back toward its 14-day epidemiological symptom-transmission duration.
Meanwhile, when using time-series data, there is a risk of mistaking white noise for signal, so it is useful to test the significance of the fluctuations. To test for signal against noise, Gershunov et al. (2001) proposed a Monte Carlo approach based on the standard deviations of rolling correlations. The test below, from the dLagM package, shows that none of the correlations are spurious. Although these results are based on a static rolling-correlation structure with a common lag for the whole period, our bootstrapping tests show that the correlations are statistically significant.
library(dLagM)
montlts <- ts(montl[complete.cases(montl),])
RC <- rolCorPlot(montlts[,4], montlts[,5], width = c(7, 14, 21, 28, 35),
level = 0.95, main = NULL, SDtest = TRUE)
RC$test
## Width SDrolCor 95% 5%
## 1 7 0.4419783 0.24412588 0.10288274
## 2 14 0.3897478 0.13199048 0.05365409
## 3 21 0.3868039 0.10782743 0.03769225
## 4 28 0.3898378 0.08387235 0.02641335
## 5 35 0.3905363 0.06800692 0.01874170
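The logic of the SD-based Monte Carlo test can be sketched independently of dLagM, on synthetic data (the 7-day window matches the analysis above; everything else here is made up): simulate many independent noise pairs, record the standard deviation of their rolling correlations, and check whether the observed SD exceeds the upper quantile of that null distribution.

```r
library(zoo)

# SD of the rolling correlations between two series
roll_sd <- function(a, b, w) {
  sd(rollapply(cbind(a, b), width = w,
               function(m) cor(m[, 1], m[, 2]), by.column = FALSE))
}

set.seed(7)
n <- 300; w <- 7; B <- 500

# Null distribution: independent white-noise pairs carry no common signal
null_sd <- replicate(B, roll_sd(rnorm(n), rnorm(n), w))

# A pair whose relationship flips sign halfway produces genuinely
# fluctuating rolling correlations, hence a larger SD than noise alone
s <- rnorm(n)
a <- s + rnorm(n, sd = 0.3)
b <- c(s[1:150], -s[151:300]) + rnorm(n, sd = 0.3)
obs_sd <- roll_sd(a, b, w)

obs_sd > quantile(null_sd, 0.95)   # TRUE: the fluctuations are signal
```

dLagM's rolCorPlot() implements a refined version of this idea and reports the thresholds shown in the table above.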
The number of tests, the percent of positive results by age and location, and the time taken for test processing are obtained from https://covid-19.ontario.ca/data/testing-volumes-and-results#byPHU. The total test numbers reflect 7-day averages of how many tests the lab processed each day. The available information (as of 2021-02-04) starts from May 1, 2020. Here is the original 7-day average data:
library(Matrix)
library(matrixcalc)
library(readr)
rm(list = ls())
df <- data.frame(read_csv("~/Dropbox/RNS2/DataON/Rinda/Toronto_test_data.csv"))
test <- matrix(df[,2])
pr <- matrix(df[,3])
# Smoothing with loess()
t = 1:266
loepr <- loess(pr~t, degree=2, span = 0.02)
fitpr <- predict(loepr, t)
loecs <- loess(test~t, degree=2, span = 0.02)
fitcs <- predict(loecs, t)
#Plots
plot(test, type = "h", col = "pink", main = "Tests performed (7-day average)",
xlab = "Days")
lines(t, fitcs/1.2, col = "red", lwd = 2)
plot(pr, type = "h", col = "lightblue", main = "7-day average PR",
xlab = "Days")
lines(t, fitpr/1.2, col = "darkblue", lwd = 2)
To extract the daily tests, positivity rate (PR), and positive results (dtest, dpr, and dpos, respectively) from the 7-day moving averages, we apply some simple linear algebra:
A <- as.matrix(band(matrix(1, nrow = 272, ncol = 272), -3, 3)[-c(1:3, 270:272), ])
Ai <- svd.inverse(A)
mytest <- 7 * Ai %*% test
mypr <- 7 * Ai %*% pr
dailytest <- ceiling(mytest[-c(1:6), ])
dailypr <- mypr[-c(1:6), ]
dailypr <- ifelse(dailypr < 0, 0.01, dailypr)
dff <- cbind(df, dtest = dailytest, dpr = dailypr)
dff$dpos <- round(dff$dtest*dff$dpr/100, 0)
head(dff, 10)
## Date Total.tests.processed Positive.results dtest dpr dpos
## 1 May 01 2020 2442 9.5 2805 12.539700 352
## 2 May 02 2020 2450 9.4 3524 6.654067 234
## 3 May 03 2020 2460 9.6 3437 8.538682 293
## 4 May 04 2020 2423 9.7 1942 8.413041 163
## 5 May 05 2020 2408 9.6 1607 9.938682 160
## 6 May 06 2020 2561 8.8 2192 7.102785 156
## 7 May 07 2020 2575 8.9 2522 9.113041 230
## 8 May 08 2020 2616 8.7 3092 11.139700 344
## 9 May 09 2020 2664 8.5 3860 5.254067 203
## 10 May 10 2020 2671 8.3 3486 7.138682 249
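A caveat worth verifying: the system above is underdetermined (272 daily values, 266 averages), so the pseudoinverse returns the minimum-norm solution, which reproduces the 7-day averages exactly but need not equal the true daily counts. The synthetic check below mirrors the construction above at toy dimensions, with hypothetical counts:

```r
library(Matrix)       # band()
library(matrixcalc)   # svd.inverse()

set.seed(3)
nd    <- 30                                  # toy daily series length
daily <- rpois(nd, lambda = 100)             # hypothetical daily counts

# Centered 7-day moving averages, defined for days 4 .. nd-3
avg7 <- stats::filter(daily, rep(1/7, 7), sides = 2)
avg7 <- as.numeric(avg7[!is.na(avg7)])       # nd - 6 values

# Same construction as above: banded ones matrix with the incomplete
# boundary windows dropped, then a pseudoinverse via svd.inverse()
A   <- as.matrix(band(matrix(1, nd, nd), -3, 3)[-c(1:3, (nd - 2):nd), ])
Ai  <- svd.inverse(A)
rec <- as.numeric(7 * Ai %*% avg7)           # recovered daily series

# Consistency check: the recovered series reproduces the averages exactly,
# even though rec itself may differ from `daily`
max(abs(as.numeric(A %*% rec) / 7 - avg7))   # ~ 0 (machine precision)
```

This is one reason the extracted daily positives can differ from officially reported case counts, in addition to the false-positive issue discussed below.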
Here are the same graphs with daily numbers:
test <- dff[,4]
pr <- dff[,5]
# Smoothing with loess()
t = 1:266
loepr <- loess(pr~t, degree=2, span = 0.02)
fitpr <- predict(loepr, t)
loecs <- loess(test~t, degree=2, span = 0.02)
fitcs <- predict(loecs, t)
#Plots
plot(test, type = "h", col = "pink", main = "Daily tests performed",
xlab = "Days")
lines(t, fitcs/1.5, col = "darkgreen", lwd = 2)
plot(pr, type = "h", col = "lightblue", main = "Daily PR",
xlab = "Days")
lines(t, fitpr/1.5, col = "darkgreen", lwd = 2)
While it was not obvious when the data were smoothed with a 7-day moving average, the daily test numbers now show clear drops on Victoria Day (May 18), Thanksgiving (Oct 12), Christmas (Dec 25), and New Year's Day (Jan 1, 2021). Lower test numbers can translate into lower case numbers, and this kind of bias in case counts is well documented in the literature; that is why using PR matters. On the other hand, the positive results calculated as (tests x PR) differ from the officially reported case numbers for Toronto. This could result from false positives.
# Adding mobility data
TorontoMob <- read_dta("~/Dropbox/RNS2/DataON/Rinda/TorontoMob.dta")
tordf <- data.frame(pr = dff$dpr, mob = TorontoMob$all_day_bing_tiles_visited_relat[62:327])
# Rolling correlation
lag = 1:21
w = 7
mco <- c()
lagg <- c()
mdf <- tordf
n <- nrow(mdf)-w
rol <- c()
for(j in 1:n){
co <-c()
for(i in 1:length(lag)) {
mdf1 <- mdf %>% mutate(dprL = lead(pr, lag[i]))
mdf2 <- mdf1[complete.cases(mdf1), ]
k <- j-1
mdf3 <- mdf2[j:(w+k), ]
co[i] <- cor(mdf3[,2], mdf3[,3])
}
mco[j] <- max(co, na.rm = TRUE)
lagg[j] <- lag[which.max(co)]
}
roll <- mco[-length(mco)]
laga <- lagg[-length(lagg)]
loerc <- loess(roll~c(1:length(roll)), degree=2, span = 0.05)
fitrc <- predict(loerc, 1:length(roll))
plot(roll[100:length(roll)], type = "h", lwd = 2, col = "pink",
main = "Rolling correlation with tuned delay and 7-day window",
xlab = "Days", ylab = "Correlation",
ylim = c(0, 1.1))
lines(fitrc[100:length(roll)], col = "red", lwd = 2)
lines(fitpr[100:length(roll)]/12, col = "darkblue", lwd = 2)
abline(a = mean(roll[100:length(roll)]), b = 0)
text(50, 1, paste0("mean corr = ", round(mean(roll[100:length(roll)]), 2)))
Now we look at how many days of delay between mobility and PR the algorithm identifies in each 7-day rolling window.
loel <- loess(laga~c(1:length(laga)), degree=2, span = 0.05)
fitl <- predict(loel, 1:length(laga))
plot(laga[100:length(laga)], type = "h", lwd = 2, col = "lightblue",
main = "Average delay in the effect of mobility restrictions on the spread",
xlab = "Days", ylab = "Mobility Delays - Day",
ylim = c(1, 21))
lines(fitl[100:length(laga)], col = "red", lwd = 2)
lines(fitpr[100:length(roll)]*1.6, col = "darkblue", lwd = 2)
abline(a = mean(laga[100:length(laga)]), b = 0)
text(30, 20 , paste0("mean = ", round(mean(laga[100:length(laga)]),2),
" days"))
In our recent paper, Dynamics of Social Mobility during the COVID-19 Pandemic in Canada, we start with the following abstract:
As the number of cases increases globally, governments and authorities have continued to use mobility restrictions that were, and still are, the only effective tool to control for the viral transmission. Yet, the relationship between public orders and behavioral parameters of social distancing observed in the community is a complex process and an important policy question. The evidence shows that adherence to public orders about the social distancing is not stable and fluctuates with degree of spatial differences in information and the level of risk aversion. This study aims to uncover the behavioural parameters of change in mobility dynamics in major Canadian cities and questions the role of people’s beliefs about how contagious the disease is on the level of compliancy to public orders. Our findings reveal that the degree of social distancing under strict restrictions is bound by choice, which is affected by the departure of people’s beliefs from the public order about how severe the effects of disease are. Understanding the dynamics of social distancing thus helps reduce the growth rate of the number of infections, compared to that predicted by epidemiological models.
For the first time in history, we now have multiple mobility metrics at fine spatial and temporal scales. The question is very simple: do we have a significant and meaningful relationship between the transmission of the viral infection and the geotemporal fluctuations in mobility? Although it looks like a simple question that can be answered with the available data, there are fundamental challenges that make it a very complex problem:
In this short work, we intend to show the basic difficulties, and possible solutions, in capturing the underlying relationship between mobility restrictions and the COVID-19 spread. Our results show that:
A work by ML-Portal